AITopics | representation change

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Neural Information Processing Systems | Feb-7-2026, 08:54:51 GMT

Learning Rate Warmup is a popular heuristic for training neural networks, especially at larger batch sizes, despite limited understanding of its benefits.

large language model, machine learning, warmup, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

059445c2d5b3ef918079851628fef1d6-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 17:34:55 GMT

gradient, update size, warmup, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Kosson, Atli, Messmer, Bettina, Jaggi, Martin

arXiv.org Artificial Intelligence | Oct-31-2024

Learning Rate Warmup is a popular heuristic for training neural networks, especially at larger batch sizes, despite limited understanding of its benefits. Warmup decreases the update size $\Delta \mathbf{w}_t = \eta_t \mathbf{u}_t$ early in training by using lower values for the learning rate $\eta_t$. In this work we argue that warmup benefits training by keeping the overall size of $\Delta \mathbf{w}_t$ limited, counteracting large initial values of $\mathbf{u}_t$. Focusing on small-scale GPT training with AdamW/Lion, we explore the following question: Why and by which criteria are early updates $\mathbf{u}_t$ too large? We analyze different metrics for the update size including the $\ell_2$-norm, resulting directional change, and impact on the representations of the network, providing a new perspective on warmup. In particular, we find that warmup helps counteract large angular updates as well as a limited critical batch size early in training. Finally, we show that the need for warmup can be significantly reduced or eliminated by modifying the optimizer to explicitly normalize $\mathbf{u}_t$ based on the aforementioned metrics.
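
The quantities named in the abstract above can be illustrated with a minimal sketch. It computes a warmup learning rate $\eta_t$, the $\ell_2$-norm of the update $\Delta \mathbf{w}_t = \eta_t \mathbf{u}_t$, and a rescaled $\mathbf{u}_t$ with a fixed norm. The linear schedule shape, the function names, and the plain $\ell_2$ rescaling are assumptions made for illustration; they are not the paper's optimizer modifications.

```python
import math


def linear_warmup_lr(step, peak_lr, warmup_steps):
    """Learning rate eta_t under a linear warmup (assumed schedule shape)."""
    if warmup_steps <= 0:
        return peak_lr
    # Ramp from peak_lr / warmup_steps up to peak_lr, then stay constant.
    return peak_lr * min(1.0, (step + 1) / warmup_steps)


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def update_size(step, u_t, peak_lr, warmup_steps):
    """l2-norm of Delta w_t = eta_t * u_t for a given optimizer update u_t."""
    eta_t = linear_warmup_lr(step, peak_lr, warmup_steps)
    return eta_t * l2_norm(u_t)


def normalized_update(u_t, target_norm):
    """Rescale u_t to a fixed l2-norm (hypothetical stand-in for explicit
    normalization; the paper's metrics are not reproduced here)."""
    norm = l2_norm(u_t)
    if norm == 0.0:
        return list(u_t)
    return [x * target_norm / norm for x in u_t]


if __name__ == "__main__":
    u = [3.0, 4.0]  # made-up update vector with norm 5
    for step in (0, 4, 9, 50):
        print(step, update_size(step, u, peak_lr=1.0, warmup_steps=10))
```

With a large early $\mathbf{u}_t$, the warmup factor shrinks $\lVert \Delta \mathbf{w}_t \rVert$ in the first steps; rescaling $\mathbf{u}_t$ directly bounds the same quantity without changing the schedule.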

arxiv, update size, warmup, (13 more...)

arXiv.org Artificial Intelligence

2410.23922

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Similarity of Pre-trained and Fine-tuned Representations

Goerttler, Thomas, Obermayer, Klaus

arXiv.org Artificial Intelligence | Jul-19-2022

Representation similarity analysis shows that the most significant change still occurs in the head even if all weights are updatable. However, recent results from few-shot learning have shown that representation change in the early layers, which are mostly convolutional, is beneficial, especially in the case of cross-domain adaption. In our paper, we find out whether that also holds true for transfer learning. In addition, we analyze the change of representation in transfer learning, both during pre-training and fine-tuning, and find out that pre-trained structure is unlearned if not usable.

However, Oh et al. (2021) found out that, especially in the case of cross-domain adaption, where the fine-tuning task does not come from the same distribution as in training, also an adaptation of earlier layers is very beneficial. Neyshabur et al. (2020) investigated what is transferred in transfer learning by shuffling the blocks of inputs. They confirmed that lower layers are responsible for more general features and that a network with pre-trained weights stays in the same basin of solution during fine-tuning. This paper analyses representation obtained by models having ...

cifar-10, representation, similarity, (14 more...)

arXiv.org Artificial Intelligence

2207.09225

Country:

Europe > Germany > Berlin (0.05)
Europe > Austria (0.05)
North America > United States > Maryland (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Does MAML Only Work via Feature Re-use? A Data Centric Perspective

Miranda, Brando, Wang, Yu-Xiong, Koyejo, Sanmi

arXiv.org Artificial Intelligence | Dec-24-2021

Recent work has suggested that a good embedding is all we need to solve many few-shot learning benchmarks. Furthermore, other work has strongly suggested that Model Agnostic Meta-Learning (MAML) also works via this same method - by learning a good embedding. These observations highlight our lack of understanding of what meta-learning algorithms are doing and when they work. In this work, we provide empirical results that shed some light on how meta-learned MAML representations function. In particular, we identify three interesting properties: 1) In contrast to previous work, we show that it is possible to define a family of synthetic benchmarks that result in a low degree of feature re-use - suggesting that current few-shot learning benchmarks might not have the properties needed for the success of meta-learning algorithms; 2) meta-overfitting occurs when the number of classes (or concepts) is finite, and this issue disappears once the task has an unbounded number of concepts (e.g., online learning); 3) more adaptation at meta-test time with MAML does not necessarily result in a significant representation change or even an improvement in meta-test performance - even when training on our proposed synthetic benchmarks. Finally, we suggest that to understand meta-learning algorithms better, we must go beyond tracking only absolute performance and, in addition, formally quantify the degree of meta-learning and track both metrics together. Reporting results in future work this way will help us identify the sources of meta-overfitting more accurately and help us design more flexible meta-learning algorithms that learn beyond fixed feature re-use. We also conjecture the core challenge of re-thinking meta-learning is in the design of few-shot learning data sets and benchmarks - rather than in the algorithms, as suggested by previous work.

benchmark, feature re-use, maml, (15 more...)

arXiv.org Artificial Intelligence

2112.13137

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
Europe > France (0.04)

Genre: Research Report (1.00)

Industry: Education > Educational Setting (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

Features, Projections, and Representation Change for Generalized Planning

Bonet, Blai, Geffner, Hector

arXiv.org Artificial Intelligence | May-15-2018

Generalized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a mapping from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and objects change across instances. In this work, we extend the standard formulation of generalized planning to such domains. This is achieved by projecting the actions over the features, resulting in a common set of abstract actions which can be tested for soundness and completeness, and which can be used for generating general policies such as "if the gripper is empty, pick the clear block above x and place it on the table" that achieve the goal clear(x) in any Blocksworld instance. In this policy, "pick the clear block above x" is an abstract action that may represent the action Unstack(a, b) in one situation and the action Unstack(b, c) in another. Transformations are also introduced for computing such policies by means of fully observable non-deterministic (FOND) planners. The value of generalized representations for learning general policies is also discussed.
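
The general policy quoted in the abstract above can be sketched in a few lines. The state encoding (a dict mapping each block to what it rests on), the function names, and the `PutDown` action name are assumptions made for illustration; the sketch also assumes the gripper starts empty and merges "pick" and "place" into one loop step. It is not the paper's projection method or its FOND-based computation.

```python
def block_above(on, x):
    """Return the block resting directly on x, or None if x is clear."""
    for block, below in on.items():
        if below == x:
            return block
    return None


def clear_block_above(on, x):
    """Abstract action target: the clear block at the top of the tower above x."""
    top = block_above(on, x)
    if top is None:
        return None
    while block_above(on, top) is not None:
        top = block_above(on, top)
    return top


def achieve_clear(on, x):
    """Run the general policy until clear(x) holds; return the concrete actions."""
    on = dict(on)  # work on a copy so the caller's state is untouched
    plan = []
    while True:
        top = clear_block_above(on, x)
        if top is None:
            break
        # The same abstract action grounds to a different Unstack each time.
        plan.append(f"Unstack({top}, {on[top]})")
        plan.append(f"PutDown({top})")
        on[top] = "table"
    return plan


if __name__ == "__main__":
    # Tower a on b on c; the goal is clear(c).
    print(achieve_clear({"a": "b", "b": "c", "c": "table"}, "c"))
```

In this three-block instance the abstract action grounds first to Unstack(a, b) and then to Unstack(b, c), which mirrors the example given in the abstract; the same function works unchanged on instances with different blocks.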

abstract action, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1801.10055

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Meta-Search Through the Space of Representations and Heuristics on a Problem by Problem Basis

Fuentetaja, Raquel (Universidad Carlos III de Madrid) | Barley, Michael (University of Auckland) | Borrajo, Daniel (Universidad Carlos III de Madrid) | Douglas, Jordan (University of Auckland) | Franco, Santiago (University of Huddersfield) | Riddle, Patricia (University of Auckland)

AAAI Conferences | Feb-8-2018

Two key aspects of problem solving are representation and search heuristics. Both theoretical and experimental studies have shown that there is no one best problem representation nor one best search heuristic. Therefore, some recent methods, e.g., portfolios, learn a good combination of problem solvers to be used in a given domain or set of domains. There are even dynamic portfolios that select a particular combination of problem solvers specific to a problem. These approaches: (1) need to perform a learning step; (2) do not usually focus on changing the representation of the input domain/problem; and (3) frequently do not adapt the portfolio to the specific problem. This paper describes a meta-reasoning system that searches through the space of combinations of representations and heuristics to find one suitable for optimally solving the specific problem. We show that this approach can be better than selecting a combination to use for all problems within a domain and is competitive with state of the art optimal planners.

meta-search state, representation, representation change, (17 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

AI and Consciousness: Theoretical Foundations and Current Approaches

AI Magazine | Jan-4-2018, 07:07:59 GMT

The Association for the Advancement of Artificial Intelligence presented the 2007 Fall Symposium Series on Friday through Sunday, November 9-11, at the Westin Arlington Gateway, Arlington, Virginia. The titles of the seven symposia were (1) AI and Consciousness: Theoretical Foundations and Current Approaches, (2) Artificial Intelligence for Prognostics, (3) Cognitive Approaches to Natural Language Processing, (4) Computational Approaches to Representation Change during Learning and Development, (5) Emergent Agents and Socialities: Social and Organizational Aspects of Intelligence, (6) Intelligent Narrative Technologies, and (7) Regarding the "Intelligence" in Distributed Intelligent Systems. Is it possible to build a conscious machine? Is trying to design and build a conscious machine helpful to understanding the nature of consciousness? These questions have been at the core of AI since its beginnings.

artificial intelligence, health & medicine, symposium, (15 more...)

AI Magazine

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
